Abstract. Self-Organizing Maps are models for unsupervised representation
formation of cortical receptor fields by stimuli-driven self-organization in
laterally coupled winner-take-all feedforward structures. This paper discusses
modifications of the original Kohonen model that were motivated by a potential
function, with regard to their ability to set up a neural mapping of maximal
mutual information. Enhancing the winner update, instead of relaxing it,
results in an algorithm that generates an infomax map corresponding to a
magnification exponent of one. Although more than one algorithm may show the
same magnification exponent, the magnification law is an experimentally
accessible quantity and therefore suitable for a quantitative description of
neural optimization principles.

Self-Organizing Maps are one of the most successful paradigms in the
mathematical modelling of special aspects of brain function, although a
quantitative understanding of the neurobiological learning dynamics and its
implications for the mathematical process of structure formation is still
lacking. A biological discrimination between models may be difficult, and it
is not completely clear [1] which optimization goals are dominant in the
biological development of, e.g., skin, auditory, olfactory or retina receptor
fields. All of them roughly show a self-organizing ordering that can most
simply be described by the Self-Organizing Feature Map [5], defined as
follows:

Every stimulus $v$ in input space (receptor field) is assigned to a so-called
winner (or center of excitation) $s$ for which the distance $|v - w_s|$ to the
stimulus is minimal. According to Kohonen, all weight vectors are updated by

$$\delta w_r = \eta \, g_{rs} \cdot (v - w_r). \tag{1}$$

This can be interpreted as a Hebbian learning rule; $\eta$ is the learning
rate, and $g_{rs}$ determines the (pre-defined) topology of the neural layer.
While Kohonen chose $g$ to be 1 within a fixed neighbourhood and 0 elsewhere,
a Gaussian kernel (with a width decreasing in time) is more common. The
Self-Organizing Map concept can be used with regular lattices of any dimension
(although 1, 2 or 3 dimensions are preferred for easy visualization), with an
irregular lattice, with none (Vector Quantization) [3], or with a neural gas
[I], where the coefficients $g$ are determined by the rank of the distance in
input space. In all cases, learning can be implemented by serial (stochastic),
batch, or parallel updates.
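
The serial (stochastic) form of update rule (1) can be sketched in a few lines. This is only an illustration under stated assumptions: a one-dimensional chain of neurons, a Gaussian neighbourhood kernel, and the hypothetical helper names `gaussian_kernel`, `find_winner` and `som_step`, none of which come from the original text.

```python
import math
import random

def gaussian_kernel(r, s, gamma):
    """Neighbourhood coefficient g_rs for chain positions r and s (assumed Gaussian)."""
    return math.exp(-((r - s) ** 2) / (2.0 * gamma ** 2))

def find_winner(weights, v):
    """Index s of the weight vector closest to the stimulus v."""
    return min(range(len(weights)), key=lambda r: math.dist(weights[r], v))

def som_step(weights, v, eta, gamma):
    """Apply delta w_r = eta * g_rs * (v - w_r) to every neuron, in place."""
    s = find_winner(weights, v)
    for r, w in enumerate(weights):
        g = gaussian_kernel(r, s, gamma)
        for i in range(len(w)):
            w[i] += eta * g * (v[i] - w[i])
    return s

# Illustrative setup: 10 neurons with random 2D weights and one stimulus.
random.seed(0)
weights = [[random.random(), random.random()] for _ in range(10)]
stimulus = [0.5, 0.5]
before = [math.dist(w, stimulus) for w in weights]
winner = som_step(weights, stimulus, eta=0.1, gamma=1.0)
after = [math.dist(w, stimulus) for w in weights]
```

Every weight vector moves towards the stimulus by a fraction $\eta g_{rs}$ of its distance, so the winner (with $g_{ss} = 1$) moves the most in relative terms.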

1 The potential function for the discrete-stimulus non-border-crossing case

One disadvantage of the original Kohonen learning rule is that it has no
potential (or objective, or cost) function valid for the general case of
continuously distributed input spaces. Ritter, Martinetz and Schulten [5]
gave, for the expectation value of the learning step ($F_s$ denotes the
Voronoi cell of $s$),

$$\langle \delta w_r \rangle = \eta \sum_\mu p(v^\mu) \, g_{r s(v^\mu)} \, (v^\mu - w_r) \tag{2}$$

$$= \eta \sum_s g_{rs} \sum_{\mu \,|\, v^\mu \in F_s} p(v^\mu) \, (v^\mu - w_r) = -\eta \, \nabla_{w_r} V(\{w\}),$$

the following potential function:

$$V(\{w\}) = \frac{1}{2} \sum_{r,s} g_{rs} \sum_{\mu \,|\, v^\mu \in F_s(\{w\})} p(v^\mu) \, |v^\mu - w_r|^2. \tag{3}$$

This expression is, however, a valid potential only in cases like the end
phase of learning of Travelling-Salesman-type optimization procedures, i.e.
for input spaces with a discrete input probability distribution, and only as
long as the borders of the Voronoi tessellation $F_s(\{w\})$ do not shift
across a stimulus vector (Fig. 1), which would result in discontinuities.




Fig. 1. Movement of Voronoi borders due to the change of a weight vector $w_r$.

As Kohonen pointed out [6], when differentiating (3) with respect to a weight
vector $w_r$, one has to take into account the movement of the borders of the
Voronoi tessellation. This leads to a correction of the original SOM rule by
an additional term for the winner, so (3) is a potential function for the
Winner Relaxing Kohonen (WRK) rule rather than for the SOM. This approach has
been generalized [7] to obtain infomax maps, as will be discussed in
Section 4.

2 Limiting cases of WRK potential and Elastic Net

For a Gaussian neighbourhood kernel

$$g^\gamma_{rs} \sim \exp\left(-(r - s)^2 / 2\gamma^2\right), \tag{4}$$

it is illustrative to look at limiting cases for the kernel width $\gamma$.
The limit $\gamma \to \infty$ implies that $g_{rs}$ is constant, which means
that all neurons receive the same learning step and there is no adaptation at
all. On the other hand, the limit $\gamma = 0$ gives
$g^\gamma_{rs} = \delta_{rs}$, which coincides with the so-called Vector
Quantization (VQ) [3]; this means there is no neighbourhood interaction at
all.

The interesting case is where $\gamma$ is small, which corresponds to the
parameter choice for the end phase of learning. Defining
$\kappa := \exp(-1/2\gamma^2)$, we can expand ($\kappa$-expansion of the
Kohonen algorithm [8])

$$g^\gamma_{rs} = \delta_{rs} + \kappa \, (\delta_{r,s-1} + \delta_{r,s+1}) + o(\kappa^2). \tag{5}$$

Here we have written the sum over the two nearest neighbours in the
one-dimensional case (neural chain), but the generalization to higher
dimensions is straightforward. (Note that instead of evaluating the Gaussian
for each learning step, one saves a considerable amount of computation by
storing the powers of $\kappa$ in a lookup table for a moderate kernel size
and neglecting the small contributions outside. Using the
$(\kappa, 1, \kappa)$ kernel instead of the "original" $(1, 1, 1)$ Kohonen
kernel reduces fluctuations and preserves the magnification law better; see
[9] for the corrections for a non-Gaussian kernel.)
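
The lookup-table remark can be made concrete with a minimal sketch. It uses the identity that the Gaussian kernel at lattice distance $d$ equals $\kappa^{d^2}$; the function names `kappa_table` and `kernel_from_table`, the cutoff `max_distance` and the value of `gamma` are illustrative assumptions, not taken from the original.

```python
import math

def kappa_table(gamma, max_distance):
    """Kernel values g(d) = kappa**(d*d) for lattice distances d = 0..max_distance."""
    kappa = math.exp(-1.0 / (2.0 * gamma ** 2))
    return [kappa ** (d * d) for d in range(max_distance + 1)]

def kernel_from_table(table, r, s):
    """Look up g_rs; contributions beyond the stored range are neglected."""
    d = abs(r - s)
    return table[d] if d < len(table) else 0.0

gamma = 0.5
table = kappa_table(gamma, max_distance=3)
# Direct evaluation of the Gaussian, for comparison with the table.
direct = [math.exp(-(d * d) / (2.0 * gamma ** 2)) for d in range(4)]
```

The table is built once per value of the kernel width, so the exponential is not re-evaluated in every learning step. Truncating the table at distance 1 gives the $(\kappa, 1, \kappa)$ kernel mentioned above.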

If we now restrict ourselves to a Travelling Salesman setup (a 1D chain with
periodic boundary conditions) in which the number of neurons equals the
number of stimuli (cities), then the potential reduces to

$$V_{\mathrm{WRK;TSP}} = \frac{1}{2} \sum_\mu |v^\mu - w_{s(v^\mu)}|^2 + \kappa \sum_r |w_{r+1} - w_r|^2 + o(\kappa^2). \tag{6}$$

This coincides with the $\sigma \to 0$ limit (the limit of high input
resolution, or low temperature) of the Durbin and Willshaw Elastic Net [10],
which also has a local (universal) magnification law (see Section 3) in the
1D case [11]. Astonishingly, however, this law is not a power law (the
Elastic Net seems to be the only feature map for which this is the case); low
magnification is delimited by elasticity. As the Elastic Net is troublesome
concerning the parameter choice [12] for stable behaviour, especially for
serial presentation (as a feature map model would require), the connection
between Elastic Net and WRK should be taken more as a motivation to study the
WRK map. Apart from ordering times and reconstruction errors, one
quantitative measure for feedforward structures is the transferred mutual
information, which is related to the magnification law, as described in the
following section.

3 The Infomax principle and the Magnification Law

Information theory [13] gives a quantitative framework to describe
information transfer through feedforward systems. The mutual information
between input and output is maximal when output and input are always
identical; it then equals the information entropy of the input. If the output
is completely random, the mutual information vanishes. In noisy systems,
maximization of mutual information leads, e.g., to the optimal number of
redundant transmissions.
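
The two limiting cases can be checked numerically with a small sketch. The three-symbol input distribution `p_input` and the helper names are assumptions chosen purely for illustration.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits for a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    info = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0.0:
                info += pxy * math.log2(pxy / (px[x] * py[y]))
    return info

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0.0)

p_input = [0.5, 0.25, 0.25]  # illustrative input distribution

# Output always identical to the input: probability mass only on the diagonal.
identical = [[p if x == y else 0.0 for y in range(3)]
             for x, p in enumerate(p_input)]

# Output drawn uniformly at random, independent of the input.
random_out = [[p / 3.0 for _ in range(3)] for p in p_input]

mi_identical = mutual_information(identical)
mi_random = mutual_information(random_out)
h_input = entropy(p_input)
```

For the identical channel the mutual information equals the input entropy; for the independent channel it vanishes up to rounding error.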

Linsker [14] was the first to apply this "infomax principle" to a
self-organizing map architecture (one should note that he used a slightly
different formulation of the neural dynamics, and that the algorithm itself
is computationally very costly). However, the approach can be used to
quantify information preservation for other algorithms as well.

This can be done in a straightforward manner by looking at the magnification
behaviour of a self-organizing map. The magnification factor is defined as
the number of neurons per unit volume in input space. This density is equal
to the inverse of the Jacobian of the input-output mapping. The remarkable
property of many self-organizing maps is that the magnification factor (at
least in 1D) becomes a function of the local input probability density (and
is therefore independent of the density elsewhere, apart from the
normalization), and in most cases even follows a power law. While the
Self-Organizing Map shows a power law with exponent 2/3 [15], an exponent of
1 would correspond (for a map without added noise) to maximal mutual
information or, equivalently, to the case that the firing probability of all
neurons is the same.
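
The link between the magnification exponent and the firing probabilities can be illustrated with an idealized sketch. It does not run any learning algorithm: neurons are simply placed so that their density is proportional to $p(x)^\alpha$, and the firing probability of each neuron is the input probability mass in its Voronoi cell. The input density $p(x) = 2x$ on $[0, 1]$, the number of neurons and all function names are assumptions made for this example. For a noise-free deterministic map, the mutual information equals the entropy of the output (firing) distribution.

```python
import math

def neuron_positions(alpha, n):
    """Place n neurons on [0, 1] with density proportional to p(x)**alpha, p(x) = 2x."""
    # The density x**alpha has the cumulative distribution x**(alpha + 1).
    return [((i + 0.5) / n) ** (1.0 / (alpha + 1.0)) for i in range(n)]

def firing_probabilities(positions):
    """Probability mass of p(x) = 2x inside each neuron's Voronoi cell."""
    borders = [0.0]
    borders += [(a + b) / 2.0 for a, b in zip(positions, positions[1:])]
    borders.append(1.0)
    # The cumulative distribution of p(x) = 2x is x**2.
    return [hi ** 2 - lo ** 2 for lo, hi in zip(borders, borders[1:])]

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

n = 100
h_infomax = entropy(firing_probabilities(neuron_positions(1.0, n)))    # exponent 1
h_som = entropy(firing_probabilities(neuron_positions(2.0 / 3.0, n)))  # exponent 2/3
h_max = math.log(n)  # entropy of exactly equal firing probabilities
```

With exponent 1 the firing probabilities are nearly equal and the output entropy comes close to `h_max`; with exponent 2/3 the neurons in high-density regions fire more often and the entropy is lower.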

In higher dimensions, however, the stationary state of the weight vectors
will in general not decouple, so the magnification law no longer depends only
on the local input probability density (Fig. 2).

Fig. 2. For D ≥ 2 networks, the neural density in general is not a local function of
the stimulus density, but depends also on the density in neighbouring regions.

4 Generalized Winner Relaxing Kohonen Algorithms

As pointed out in [7], the prefactor 1/2 in the WRK learning rule can be
replaced by a free parameter $\lambda$, giving the Generalized Winner
Relaxing Kohonen (GWRK) Algorithm

$$\delta w_r = \eta \Big\{ g^\gamma_{rs} \, (v^\mu - w_r) - \lambda \, \delta_{rs} \sum_{r' \neq s} g^\gamma_{r's} \, (v^\mu - w_{r'}) \Big\}. \tag{7}$$

The $\delta_{rs}$ restricts the modification to the winner update only (the
first term is the classical Kohonen SOM). The subtracted ($\lambda > 0$,
winner relaxing case) or added ($\lambda < 0$, winner enhancing case) sum
corresponds to a center-of-mass movement of the rest of the weight vectors.
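
A minimal sketch of one serial GWRK update, under the reading of rule (7) described above (classical SOM term for all neurons, plus a winner-only term built from the sum over all other neurons), might look as follows. The scalar weights on a 1D chain, the Gaussian kernel and the name `gwrk_step` are assumptions made for illustration.

```python
import math

def gwrk_step(weights, v, eta, gamma, lam):
    """One serial GWRK update on a 1D chain of scalar weights, in place."""
    n = len(weights)
    s = min(range(n), key=lambda r: abs(v - weights[r]))  # winner
    g = [math.exp(-((r - s) ** 2) / (2.0 * gamma ** 2)) for r in range(n)]
    # Sum over all non-winner neurons, evaluated before any weight is changed.
    rest = sum(g[r] * (v - weights[r]) for r in range(n) if r != s)
    for r in range(n):
        delta = g[r] * (v - weights[r])  # classical Kohonen SOM term
        if r == s:
            delta -= lam * rest          # relaxing (lam > 0) or enhancing (lam < 0)
        weights[r] += eta * delta
    return s

base = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
som = list(base)
wrk = list(base)
s_som = gwrk_step(som, 0.45, eta=0.1, gamma=1.0, lam=0.0)  # plain SOM
s_wrk = gwrk_step(wrk, 0.45, eta=0.1, gamma=1.0, lam=0.5)  # WRK
```

Setting `lam=0.0` reproduces the SOM step, and only the winner's weight differs between the two runs.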

The magnification law has been shown to be a power law [7] with exponent 4/7
for the Winner Relaxing Kohonen (associated with the potential) and
$2/(3 + \lambda)$ for the Generalized Winner Relaxing Kohonen (see Fig. 3).
As stability for serial update can be achieved only within
$-1 < \lambda < +1$, the magnification exponent can be adjusted between 1/2
and 1 by an a priori choice of the parameter $\lambda$.
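
As a quick arithmetic check of the exponent formula, the special cases can be evaluated exactly. The function name is hypothetical, and the values at $\lambda = \pm 1$ are the limiting values at the edge of the stability range.

```python
from fractions import Fraction

def gwrk_magnification_exponent(lam):
    """Magnification exponent 2 / (3 + lambda) of the 1D GWRK map."""
    return Fraction(2) / (3 + Fraction(lam))

exponent_som = gwrk_magnification_exponent(0)                # SOM, lambda = 0
exponent_wrk = gwrk_magnification_exponent(Fraction(1, 2))   # WRK, lambda = 1/2
exponent_relaxing_limit = gwrk_magnification_exponent(1)     # lambda -> +1
exponent_enhancing_limit = gwrk_magnification_exponent(-1)   # lambda -> -1
```

The winner-enhancing limit gives the infomax exponent of one, while the winner-relaxing limit gives 1/2.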




Fig. 3. The magnification exponent of the Generalized Winner Relaxing Kohonen
as a function of $\lambda$, including the special cases SOM ($\lambda = 0$) and WRK ($\lambda = 1/2$).



Although Kohonen reported that the WRK has a larger fraction of initial
conditions ordered after finite time [6], one can, on the other hand, ask how
fast a rough ordering is reached from a random initial configuration. Here
the average ordering time can have a minimum [16] for negative $\lambda$,
corresponding to a near-infomax regime. As in many optimization problems,
this seems mainly to be a question of whether the average, the maximal (worst
case), or the minimal (parallel evaluation) ordering time is to be minimized.

5 Call for Experiments and Outlook

Cortical receptor fields of an adult animal show plasticity on a long
time-scale which is very well separated from the time-scale of the signal
processing dynamics. Therefore, for constant input space distributions (which
in principle can be measured), the magnification law could be accessed
experimentally by measuring the neural activity (or the number of active
neurons) with any electrical or noninvasive technique. The auditory cortex is
especially well suited for a direct comparison with mathematical modelling
because of its 1D input space. Owls and bats have a large amplitude variation
in their input probability distribution (their own "echo" frequency is heard
most often) and are therefore prominent candidates for experiments.

In a refined step, experimental and theoretical investigations of the
nonlinear modifications of the learning rules have to be carried out. On the
experimental side, it has to be clarified which refinements to long-term
potentiation and long-term depression have to be found for the weight vectors
in a neural map architecture. Mechanisms that can be included as
modifications are modified winner updates (as for GWRK), probabilistic winner
selection [17,18], or a local learning rate depending on averaged firing
rates and reconstruction errors [19,20]. On the theoretical side, it is
obvious that the same magnification exponent can be obtained by quite
different algorithms. The relation between them and the transfer to more
realistic models should be investigated further.